1.0  INTRODUCTION
1.1  Background 

Army PM JCALS, under the direction of the CALS Test Network Office (CTNO), has developed
Computer-Assisted Data Acceptance (CADA) procedures for automating the acceptance of CALS
raster, Type I, data. The objective of these procedures is to reduce the labor requirements of the
manual quality assurance (QA) procedures currently in use. Effective implementation of the
CADA procedures depends on the use of reliable image quality analysis and identification
recognition techniques and tools. The QA procedure requires evaluating the quality of the image
data as well as the quality and accuracy of key identification data (ID) within the body of the
image field. The identification data, such as DRAWING NO., SIZE, REV., SHEET NO., etc., are
keyed into the header field as ASCII characters. This ID is then used to generate the index for
locating the drawings, hence it must be correct. However, because errors can be made during the
preparation of the header data, the header ID is not always accurate. It is therefore
necessary to ensure that the header ASCII ID is the same as the ID within the image area of the
drawing. This is currently done visually, by comparing the image ID with the header ID.
Manually viewing each image is both labor intensive and time consuming.
For these reasons, alternative methods were investigated to automate the
verification of the accuracy of the identification data within the CALS delivered engineering
drawing image and associated header areas.
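
The header-versus-image comparison described above can be sketched in a few lines of Python. This is only an illustration, not the CADA implementation: the function names, the use of dictionaries keyed by field name, the sample values, and the normalization rule (ignore case and spaces) are all assumptions made for the example.

```python
def normalize(value: str) -> str:
    """Upper-case the value and drop whitespace so 'a 12' matches 'A12'."""
    return "".join(value.split()).upper()


def find_id_mismatches(header_id: dict, image_id: dict) -> dict:
    """Return the fields whose header value differs from the image value.

    header_id: ASCII values keyed into the header (field name -> text).
    image_id:  values recognized from the title block (field name -> text).
    A field missing from image_id is reported as a mismatch against None.
    """
    mismatches = {}
    for field, header_value in header_id.items():
        image_value = image_id.get(field)
        if image_value is None or normalize(header_value) != normalize(image_value):
            mismatches[field] = (header_value, image_value)
    return mismatches


# Hypothetical values, invented for illustration only.
header = {"DRAWING NO.": "12345678", "SIZE": "D", "REV.": "B", "SHEET NO.": "1"}
recognized = {"DRAWING NO.": "12345678", "SIZE": "d", "REV.": "8", "SHEET NO.": "1"}
print(find_id_mismatches(header, recognized))  # {'REV.': ('B', '8')}
```

Only the fields reported as mismatches would then need to be viewed by a person, which is where the labor saving would come from.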

A technology search within industry, government and academia was initiated to investigate
techniques that would ease the labor intensive process described above. Pattern recognition
techniques, optical character recognition (OCR) techniques and Artificial Intelligence (AI)
techniques were reviewed. OCR technology from a number of vendors was tested; the
results showed that hand printed data could not be recognized successfully at the
present stage of OCR technology development. Intelligent character recognition (ICR)
techniques employing neural network technology were also analyzed. These results were much
more acceptable, especially with regard to the quality of data to be expected if the CALS
standards/specifications were followed. These findings are documented in a
reference report entitled, "Test Report: Phase III Computer-Assisted Data Acceptance" dated 26
May, 1992.

The first step in the CADA identification recognition process is to look for the title block
contained in the raster image of the engineering drawing. This entails locating the title block
image area and all of the text fields (and their data) within that area, and then extracting the key
ID data from the text fields for character recognition. This process of searching for the title
block area and extracting the appropriate ID text data is called preprocessing. The test
results pointed out the need for the CADA software to preprocess the key ID image data
and then deliver it to the neural network engine for recognition processing. A number
of neural network products were evaluated; this report identifies those commercial-off-the-shelf
(COTS) products that have been recommended for use in the CADA of CALS raster data.
Additional information on leasing costs and availability is also included.
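
The hand-off from preprocessing to recognition can be illustrated with a small Python sketch. It assumes the title block fields have already been located and are given as pixel boxes. The raster representation (a list of rows of 0/1 pixels), the box convention, the function names, and the stand-in engine are all hypothetical and are not taken from any vendor product.

```python
from typing import Callable, Dict, List, Tuple

Raster = List[List[int]]           # rows of pixels, 1 = black, 0 = white
Box = Tuple[int, int, int, int]    # (top, left, bottom, right); bottom/right exclusive


def crop(image: Raster, box: Box) -> Raster:
    """Cut the sub-image bounded by box out of the raster."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def preprocess(image: Raster, field_boxes: Dict[str, Box]) -> Dict[str, Raster]:
    """Extract one sub-image per key ID field located in the title block."""
    return {name: crop(image, box) for name, box in field_boxes.items()}


def recognize_fields(field_images: Dict[str, Raster],
                     engine: Callable[[Raster], str]) -> Dict[str, str]:
    """Pass each extracted field image to the recognition engine."""
    return {name: engine(img) for name, img in field_images.items()}


# Tiny made-up raster and field boxes, for illustration only.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
]
boxes = {"SIZE": (1, 1, 3, 3), "REV.": (2, 4, 4, 5)}
fields = preprocess(image, boxes)


def count_black(img: Raster) -> str:
    """Stand-in for a real engine: reports the number of black pixels."""
    return str(sum(sum(row) for row in img))


print(recognize_fields(fields, count_black))  # {'SIZE': '4', 'REV.': '2'}
```

Keeping the field extraction separate from the engine call means the recognition step can be exercised with any callable that maps a field image to text.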

1.2  Purpose

This  report  identifies  neural  network  vendors  and  their  ICR  products  suitable  for  possible 
licensing  by  the  Government  when  using  CADA  software  tools  to  accept  CALS  raster  data. 

1.3  Scope 

This  report  provides  a  brief  background  on  the  need  for  automating  the  recognition  and 
acceptance  of  identification  data  within  the  image  and  header  fields  of  raster  data  defined  by 
MIL-STD-1840  and  MIL-R-28002.  The  approach  for  identifying  ICR  vendors,  analyzing  their 
technology,  and  evaluating  their  capabilities  and  costs  is  also  included.  Further,  criteria
necessary  for  the  consideration  of  licensing  vendor  products  by  the  Government  are  also  given. 

1.4  Approach 

Thirteen OCR and ICR vendor products were reviewed at various levels of detail. The OCR
products did not perform well in the recognition of "hand printed" characters. Additional testing was
performed on four selected neural network ICR products. The detailed test results were
documented in the reference report entitled, "Test Report: Phase III Computer-Assisted Data
Acceptance" dated 26 May, 1992.

The approach taken in this report is to evaluate the vendors' products in the following four areas:
platform dependency, technical merits, support capability, and license cost and considerations.
The platforms considered for CADA were PCs and SUN Workstations, because these two
platforms are widely used in current engineering drawing systems. The technical merits
of the products were based on the results of the tests conducted under the CADA activities
mentioned above. Vendors' support capabilities available for the integration phase and
operational testing phase were also considered in the selection. The terms and conditions of the
candidate vendors' licensing agreements and the costs are also considered.

Detailed  vendor  selection  considerations  and  capabilities  are  explained  in  Section  2.0.  Section 
3.0  presents  information  for  other  vendors  evaluated  to  date. 

2.0  SELECTED  VENDOR  INFORMATION 

2.1  Selection  Considerations 

The Computer-Assisted Data Acceptance (CADA) procedures developed at the Army CALS
Technology Center, under the auspices of the CALS Test Network Office (CTNO), focus on the
acceptance of production (Level III) engineering drawing data. The detailed test results were
documented in the reference report mentioned above, dated 26 May, 1992. The test results
provide the basic information for the selection. In addition to the test results, the vendor
selection is also based on the following criteria.

1. Platform Dependency. The platforms considered for the CADA performance tests
are PCs and SUN Workstations. The choice of these two platforms is based on their current
availability in the existing engineering drawing management systems (EDMICS
and the Government's Desktop IV DoD-wide procurement). Products that need specialized
hardware or add-on boards are not considered.

Table 2-1 lists the candidate vendors tested and documented in the Test Report for the
two desired platforms.


VENDOR’S 

PC 

SUN 

OCR/NTI 

X 

MCC 

X 

Nestor 

X 

VisionShape 

X 

Table  2-1 

(Note:  OCR/NTI  is  planning  to  have  their  product  ported  to  SUN  Workstation.  Further,  Nestor 
has  already  ported  a  version  to  the  PC  and  MCC  is  planning  to  have  a  version  run  on  PC). 

2.  Technical  Merits.  The  technical  merits  of  the  vendors'  products  are  based  on  the  results  of 
the  tests  which  were  conducted  under  the  above  mentioned  CADA  activities.  The  technical 
merits  of  the  tested  products  can  be  summarized  by  two  major  factors: 

•  recognition  capability  based  on  field  recognition  percentage, 

•  recognition  capability  based  on  character  recognition  percentage.

3.  Support  Capability.  Vendors’  support  capabilities  for  the  integration  phase  and  fielded 
production  phase  were  evaluated  and  considered  for  the  selection. 

4.  License  Cost  and  Consideration.  The  terms  and  conditions  of  the  candidate  vendors'  licenses 
and  the  product  costs  are  considered  and  the  results  are  shown  in  the  following  sections. 
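
The two recognition measures named under Technical Merits can be made concrete with a short Python sketch. The report does not define how the percentages were computed, so the definitions used here are assumptions: a field counts as recognized only when the whole string matches, and characters are compared position by position against the expected string. The sample data is invented for illustration.

```python
def recognition_rates(pairs):
    """Compute (field %, character %) for (expected, recognized) string pairs.

    A field counts as recognized only if the whole string matches.
    Characters are compared position by position; expected characters
    with no counterpart in a shorter recognized string count as errors.
    Assumes pairs is non-empty and contains at least one expected character.
    """
    fields_ok = 0
    chars_ok = 0
    chars_total = 0
    for expected, recognized in pairs:
        if expected == recognized:
            fields_ok += 1
        chars_total += len(expected)
        chars_ok += sum(e == r for e, r in zip(expected, recognized))
    field_pct = 100.0 * fields_ok / len(pairs)
    char_pct = 100.0 * chars_ok / chars_total
    return field_pct, char_pct


# Hypothetical (expected, recognized) pairs.
sample = [("12345678", "12345678"), ("D", "D"), ("B", "8"), ("10", "1O")]
field_pct, char_pct = recognition_rates(sample)
print(f"field: {field_pct:.1f}%  character: {char_pct:.1f}%")
```

In this sample two of four fields match completely (50%) while ten of twelve characters match (about 83%), which shows why the field percentage is the stricter of the two measures.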

2.2  Vendors'  Capabilities 

Based  on  the  technical  merits  as  indicated  in  the  Test  Report,  OCR/NTI  has  the  best  recognition 
results  for  PC  and  MCC  has  the  best  recognition  results  for  SUN  Workstation.  In  addition  to 
these  two  companies,  Nestor  has  provided  reasonable  recognition  capability.  Table  2-2  provides 
a  test  summary  of  the  three  companies  selected  for  testing  followed  by  a  description  of  each  of 
the  companies'  capabilities  and  product  characteristics.

Nestor    MCC

General
   1) flexibility for integration into CADA system;
   2) speed;
   3) software quality and stability;
   4) Application Program Interface (API)

Preprocessing Test
   1) box finding capability;
   2) text finding capability;
   3) key field correlation;
   4) line and box removal

Character Recognition Test
   1) character segmentation accuracy;
   2) character recognition accuracy;
   3) neural network trainability for new character styles

Postprocessing Test Criteria
   1) heuristic capability;
   2) validation capability


not  tested 


not  tested 


not  tested 


H 

H 

H 

H 

H 

H 

H 

H 

not  tested 


not  tested 


not  tested 


not  tested 


not  tested 


not  tested 

2.2.1  OCR  Systems/Non-Linear  Technology  (NTI)

OCR  Systems,  Inc.  is  the  developer  of  ReadRight  OCR  products.  They  have  also  introduced  a 
version  of  this  software  with  hand-printed  character  recognition  capabilities,  which  was  developed 
in  conjunction  with  Nonlinear  Technologies  (NTI),  Inc.  of  Greenbelt,  MD.  Their  product
currently  runs  on  PCs  under  Windows  3.0. 

NTI's product performs reasonably well and at an acceptable speed without additional hardware
boards to accelerate the PC. It can recognize handwritten characters of
alphabetic, numeric or mixed type. It distinguishes mixed alphabetic and numeric
characters with better recognition accuracy than any other software tested. NTI uses line tracing
techniques to locate the boxes in the title block area in order to perform the preprocessing
function.

2.2.2  Microelectronics  and  Computer  Technology  Corporation  (MCC) 

MCC  used  a  supervised  learning  algorithm  developed  in  their  laboratory  environment  to 
recognize  and  segment  the  hand-printed  alpha  and  numeric  characters  that  overlapped.  Their 
recognition  engine  was  especially  good  at  recognizing  hand  scripted  characters.  They  do  not, 
however,  have  the  capability  to  accommodate  preprocessing.  Therefore,  the  front-end  text  image
extraction  was  developed  by  Act  Laboratory,  Altamonte  Springs,  FL. 

The Act Laboratory's preprocessing algorithm used corner matching technology. However, the
software is still in a development stage; it is not yet very stable, and it cannot accurately identify
all the boxes in a title block area. Therefore, the potential recognition capability of MCC's
software is not fully shown in these test results, although MCC's product was second in
recognizing characters, behind the NTI product.

2.2.3  Nestor,  Providence,  RI. 

Nestor  used  an  enhanced  version  of  its  NestorReader™  version  1.0  to  participate  in  the  CADA 
identification  recognition  test.  NestorReader  accepts  binary  images  in  both  TIFF  and  PCX
formats,  and  it  segments  and  recognizes  both  constrained  and  unconstrained  touching  characters.
Further,  it  has  the  capability  to  do  its  own  image  compression  and  decompression.  The  version 
of  NestorReader  tested  for  this  task  runs  on  the  Sun  Sparcstation.  A  fully  parallel 
implementation  of  the  NestorReader  is  also  available  on  the  INMOS  Transputer™  (TRAM)  - 
a  small  micro-processor  with  dedicated  memory.  Each  TRAM  can  accept  a  full  compressed 
image,  a  zone,  or  a  character. 

Nestor  added  pure  text  extraction  to  the  front-end  of  the  NestorReader  software  for  the 
preprocessing  CADA  tests  so  that  the  identified  candidate  character  string  image  could  be 
recognized  by  the  NestorReader.  Nestor  has  specially  trained  neural  networks  for  different  types 
of  characters,  such  as  machine-printed,  hand-written,  alphabetic,  and  numeric.  Character
recognition  capability  was  improved  by  adding  additional  segmentation  algorithms  and  further 
neural  network  training  of  the  hand-printed  characters  on  the  engineering  drawings. 

2.3  Support  Capabilities

Vendors'  support  capabilities  available  for  the  integration  phase  and  fielded  production  phase 
were  evaluated  and  considered  for  selection.  The  products  of  both  OCR/NTI  and  Nestor  are
distributed  and  supported  through  a  nationwide  network  of  computer  resellers.  MCC’s  product 
is  in  a  laboratory  environment  and  is  now  ready  for  a  commercial  company  (such  as  DEC  and/or 
Texas  Instruments)  to  sell  as  a  product.  Table  2-3  shows  the  support  capabilities  of  the  three 
companies  that  have  passed  the  technical  evaluation  criteria. 


VENDOR         SUPPORT CAPABILITY

OCR/NTI        High
MCC            Low
Nestor         High

Table 2-3

(Note: MCC's low support capability rating is due to the fact that their tool is
not yet commercially released. The situation may improve once the product is released.)

2.4  Licensing  Cost  and  Considerations 

The  following  list  shows  licensing  agreements  that  were  requested  from  the  prospective  vendors: 

a.  A  copy  of  the  standard  license  agreement  for  use  of  their  product  to  develop, 
reproduce  and  distribute  executable  programs. 

b.  A  statement  specifically  indicating  what  their  product  deliverable(s)  include(s)  and 
exclude(s):  source  code,  object  files,  object  libraries,  executable  programs, 
executable  libraries,  documentation,  sample  source  code,  utilities,  sample  test  data, 
hardware,  media,  development  support  and  update  support.  The  price  of  each 
deliverable  should  also  be  indicated. 

c.  A  statement  specifically  indicating  what  part  of  their  product  can  or  cannot  be 
distributed  with  executable  programs. 

d.  A  statement  describing  on  what  platform(s)  executable  programs  can  be
developed  with  their  product.  For  each  platform  all  hardware  and  software, 
including  the  operating  system,  required  to  run  executable  programs  must  be 
specified. 

e.  A  statement  of  the  cost(s)  for  a  license  to  distribute  executable  programs
developed  with  their  product  for  each  of  the  following  scenarios: 

1)  Royalty-free  distribution 

2)  Per-copy  distribution 

3)  Per-user  distribution 

4)  Per-site  distribution 

5)  Per-site  cost  at  a  maximum  of  2  field  sites. 

The vendors responded by sending their standard price schedules, which are based on per-copy
pricing and are therefore not user or site dependent. The MCC costs include preprocessing, so
their cost is much higher. Table 2-4 shows the vendors' license costs to the Government.


VENDOR        PER-COPY   PER-10     PER-20     PER-50     ABOVE 100
                         COPIES     COPIES     COPIES     COPIES

OCR/NTI       $2,500     $15,000    $25,000    $30,000    $500 (each)
Nestor-SUN    $6,000     $42,000    $84,000    $210,000   $3,000 (each)
Nestor-PC     $1,500     $10,500    $21,000    $52,500    $825 (each)
MCC-SUN       $12,000    $12,000    $12,000    $10,000    $8,000 (each)

Table 2-4
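
To compare the schedules in Table 2-4 on a common basis, the bundle prices can be reduced to an effective cost per copy. The Python sketch below assumes that the PER-10, PER-20 and PER-50 columns are total prices for that many copies; the report does not say so explicitly. The MCC-SUN row is left out because its entries do not read like bundle totals, so how to interpret them is unclear.

```python
# Prices transcribed from Table 2-4 (US dollars).
# Columns: 1 copy, 10 copies, 20 copies, 50 copies.
BUNDLE_SIZES = [1, 10, 20, 50]
BUNDLE_PRICES = {
    "OCR/NTI":    [2500, 15000, 25000, 30000],
    "Nestor-SUN": [6000, 42000, 84000, 210000],
    "Nestor-PC":  [1500, 10500, 21000, 52500],
}


def effective_per_copy(prices):
    """Divide each bundle price by its bundle size (assumed bundle totals)."""
    return [price / size for price, size in zip(prices, BUNDLE_SIZES)]


per_copy = {vendor: effective_per_copy(p) for vendor, p in BUNDLE_PRICES.items()}
for vendor, costs in per_copy.items():
    print(vendor, costs)
# OCR/NTI [2500.0, 1500.0, 1250.0, 600.0]
# Nestor-SUN [6000.0, 4200.0, 4200.0, 4200.0]
# Nestor-PC [1500.0, 1050.0, 1050.0, 1050.0]
```

Under that assumption the OCR/NTI discount deepens with each tier, while both Nestor schedules apply a flat 30 percent reduction from 10 copies upward.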


3.0  OTHER  VENDORS  EVALUATED 

The  following  is  the  preliminary  list  of  available  vendors  initially  selected  as  the  potential 
candidates  to  perform  the  identification  recognition  test: 


AT&T,  HNC, 

OCR  Systems/Non-Linear  Technology  (NTI),  Nestor,

VisionShape,  MCC, 

Symbus,  CAERE, 

Calera,  Datacap, 

Ektron,  NYNEX, 

OCRON,  and  Recognita.


Some  vendors  were  eliminated  in  the  early  preliminary  test  conducted  in  the  previous  phase  of 
CADA  (see  Technical  Report,  Testing  Techniques  for  Data  Acceptance  Procedures,  12  July
1991).  These  vendors  were  eliminated  due  to  one  or  more  of  the  following  reasons: 

•  did  not  have  ICR  recognition  technology  that  is  necessary  to 
provide  required  recognition  accuracy  to  satisfy  our  needs; 

•  could  not  have  their  product  ready  to  meet  our  test  schedule; 

•  did  not  have  preprocessing  capability  to  utilize  their  ICR 
technology; 

•  would  not  participate  due  to  lack  of  short  term  incentives;  or

•  needed  specialized  hardware. 

(NOTE: At a later date, if the reason for elimination is no longer true, it may be worthwhile to
review and reconsider these companies. For example, if the preprocessing capability can be
developed independent of any vendor [or obtained from a vendor with unlimited distribution
rights within the Government agencies], then those vendors that were eliminated for
lack of preprocessing capabilities can be reconsidered if they have good recognition engines.
Also, if the special hardware boards needed to accelerate recognition become available
at a more reasonable cost, then the vendors that required them should be reconsidered.)

4.0  RECOMMENDATIONS 

The  following  recommendations  are  made  based  on  the  overall  assessment  of  the  ICR  technology 
as  well  as  the  preprocessing  and  postprocessing  needed  for  the  ID  recognition. 

1. It is recommended that a commercial-off-the-shelf (COTS) recognition product be used
without the preprocessing and postprocessing add-on capabilities for the CADA ID task. The
CADA ID test results, which were documented in the reference report entitled, "Test Report:
Phase III Computer-Assisted Data Acceptance" dated 26 May, 1992, included the combined results of
preprocessing, recognition and postprocessing. It is therefore not clear which product would have
the best recognition engine if the preprocessing were equalized for all vendors. It is
understood, however, that for each recognition package, some software interface customization
may be needed.

2. It is recommended that the preprocessing and postprocessing capabilities be developed by the
Government so that one vendor's recognition engine can be more easily replaced with
another's. ICR technology is progressing quickly, and a replaceable engine will allow
technology insertion into the CADA test procedure in the future. It is also
imperative, during the CADA test product life-cycle, to keep track of new products that could
be used for CADA. Designing the recognition engine to be replaceable
will make continuous improvement possible in this fast growing technology area.

3.  It  is  recommended  that  the  OCR/NTI  product  be  used  as  the  recognition  engine  for  the
CADA  ID  test  on  the  PC  or  PC  compatible  platforms. 

4.  It  is  recommended  that  the  MCC  and  Nestor  products  be  used  as  the  recognition  engines  for 
the  CADA  ID  test  on  the  SUN  Workstation  platform.

5.  It  is  recommended  that  if  any  vendor  fails  to  provide  adequate  support  during  the  integration 
or  the  operational  testing  phase,  that  vendor’s  product  be  replaced. 

6.  It  is  recommended  that  when  new  products  for  the  recognition  engines  become  available,
further  tests  be  conducted  to  verify  the  suitability  of  those  products  for  the  CADA  task.
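
The replaceable recognition engine called for in recommendation 2 can be sketched as a narrow software interface behind which each vendor product sits. The Python below is a hypothetical design, not an interface from the report or from any vendor: the names RecognitionEngine, EngineRegistry and StubEngine are invented, and the stub simply returns a fixed answer so that the example runs without vendor software.

```python
from typing import Dict, Protocol


class RecognitionEngine(Protocol):
    """Interface a vendor adapter would implement (hypothetical)."""

    def recognize(self, field_image: bytes) -> str:
        ...


class EngineRegistry:
    """Maps a platform name to the engine adapter currently selected for it."""

    def __init__(self) -> None:
        self._engines: Dict[str, RecognitionEngine] = {}

    def register(self, platform: str, engine: RecognitionEngine) -> None:
        """Select an engine for a platform, replacing any earlier choice."""
        self._engines[platform] = engine

    def recognize(self, platform: str, field_image: bytes) -> str:
        """Run the selected engine on one preprocessed field image."""
        return self._engines[platform].recognize(field_image)


class StubEngine:
    """Placeholder adapter that returns a fixed answer, for illustration."""

    def __init__(self, answer: str) -> None:
        self._answer = answer

    def recognize(self, field_image: bytes) -> str:
        return self._answer


registry = EngineRegistry()
registry.register("PC", StubEngine("A"))
first = registry.recognize("PC", b"")
registry.register("PC", StubEngine("B"))   # swap in a different engine
second = registry.recognize("PC", b"")
print(first, second)  # A B
```

Because the preprocessing and postprocessing code would call only the registry, replacing one vendor's engine with another would require writing a new adapter and nothing else.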

5.0  SUMMARY

Technical merits for the vendors' products were based on the results of tests conducted under
the Computer-Assisted Data Acceptance (CADA) procedures. These procedures, developed at the
Army CALS Technology Center (CTC) under the auspices of the CALS Test Network Office (CTNO),
focus on the acceptance of production (Level III) engineering drawing data. Other factors considered
for the vendor selection include platform, support capability, and licensing cost and agreements.

It is concluded that preprocessing capabilities will be developed by the Government so that
one vendor's recognition engine can be more easily replaced with another's.
The products of three companies were selected for consideration for use
in the follow-on integration phase and operational testing phase. Due to the fast progress in the
ICR technology area, it is important to allow technology insertion into the CADA test procedure.
Therefore, new products will be monitored in addition to the current products on hand.

6.0  LIST  OF  ACRONYMS


ASCII  American  Standard  Code  for  Information  Interchange 

CADA  Computer-Assisted  Data  Acceptance 

CALS  Computer-aided  Acquisition  and  Logistic  Support 

COTS  Commercial-Off-The-Shelf 

CTC  CALS  Technology  Center 

CTNO  CALS  Test  Network  Office 

ICR  Intelligent  Character  Recognition 

ID  Identification  Data 

OCR  Optical  Character  Recognition 